Wei Lin

Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University, China

Are Full Rollouts Necessary for On-Policy Distillation?

Jun 01, 2026

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay

May 27, 2026

When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards

May 25, 2026

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

May 20, 2026

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

May 18, 2026

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

May 18, 2026

RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems

May 12, 2026

Birds of a Feather Cluster Nearby: a Proximity-Aware Geo-Codebook for Local Service Recommendation

Apr 25, 2026

MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches

Apr 24, 2026

SPREG: Structured Plan Repair with Entropy-Guided Test-Time Intervention for Large Language Model Reasoning

Apr 20, 2026